ABSTRACT


Classically,  the  invariant  property  of  maximum  likelihood  estimators 
has  been  limited  by  one-to-one  restrictions  on  the  transformation.  This 
thesis  defines  the  Induced  Likelihood  Function  and  develops  a  theorem 
which  may  be  used  to  extend  the  invariant  property  to  estimation  problems 
where  the  one-to-one  restriction  is  dropped.  It  is  shown  that  the  theorem 
is  applicable  to  the  k  dimensional  estimation  problem. 

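Theorem:

If 1) $f$ is a function such that $S$ is mapped into $E^1$ and
      $f(\hat{\theta}) \ge f(\theta)$ for all $\theta$ in $S$.

   2) $\phi$ is a transformation such that $S$ is mapped onto $S^*$,
      where $\phi(\hat{\theta}) = \theta_0^*$ and $\phi(\theta) = \theta^*$
      for all $\theta$ in $S$. Define an inverse on $S^*$ such that
      $\phi^{-1}(\theta_0^*) = \hat{\theta}$ and
      $\phi^{-1}(\theta^*) = \theta$ for all $\theta^*$ in $S^*$.

   3) $g$ is a function defined by $g(\theta^*) = f(\phi^{-1}(\theta^*))$
      such that $S^*$ is mapped into $E^1$

then $g(\theta_0^*) \ge g(\theta^*)$ for all $\theta^*$ in $S^*$.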

The writer wishes to express his appreciation to Professor P. W. ...
proof of the above theorem and to Professor J. R. Borsting for his
suggestions and encouragement.
SECTION I


INTRODUCTION 


1.1 Statement of the Problem

The invariance property of maximum likelihood estimation provides a
very convenient tool for statistical application. However, its use is
somewhat limited in practical applications since the property apparently
has only been shown to hold when the transformation of the parameter space
is one-to-one. This investigation evolved as a follow-up to one phase of
a reliability study undertaken by Captain W. J. Corcoran, USN, and
Dr. H. Weingarten of the Technical Division of the Special Projects Office
of the U. S. Navy and Dr. P. W. Zehna of C3IR (13)(14). This SP sponsored
study presented several estimators for the parameters of a conceptual
reliability model based on the multinomial probability distribution. One
of the proposed estimates was "like" a maximum likelihood estimate (mle)
in that it was a function of mle's; but since the function was not 1-1,
the estimate was not formally called a mle. Attempts to derive distribution
information concerning this estimate involved very complicated equations,
and these difficulties were compounded by the fact that under current
definitions and concepts, maximum likelihood estimation (MLE) distribution
theory was not correctly applicable. It was felt by Dr. Zehna that one
of the primary questions that had to be answered prior to further work on
the model was, "Does the invariance property of MLE apply when the
functional relationship is not 1-1, and if so, under what conditions?"
1.2 Purpose of the Study

The purpose of this study is to investigate the invariance property
of maximum likelihood estimation when the transformation function is not
one-to-one, and to attempt to formalize concepts, definitions, and theorems
that are applicable in this and similar situations.

1.3  Thesis  Scope  and  Organization 

Of necessity, it is assumed that the reader has a basic familiarity
with the theory of probability and statistics, and the method and
properties of maximum likelihood estimation. A brief review of some of the
more pertinent concepts of point estimation along with a summary of the
technique of MLE is presented in chapter two to provide a minimal common
background and to assure familiarity with the notation as it is used in
later discussions. Appendix one contains a chronological key to all
notation used and is referenced by numbers indicating the page in the
thesis on which the notation was originally introduced.

In chapter three the invariance property of MLE is discussed and
concepts and theorems are developed which allow the present theory to be
generalized and extended. Examples are liberally used to emphasize the
points under discussion. Chapter four, in summary, attempts to indicate the
possible contributions of the thesis, and suggests possible areas for
further study.
SECTION II


MAXIMUM  LIKELIHOOD  ESTIMATION 


2.1 Estimation: Basic Concepts

The purpose of statistical estimation is to estimate, on the basis
of an observed sample, the values of the unknown parameters of the
population from which the sample originated. Since 1763, when Bayes' memoirs
were published posthumously, there has been wide controversy and discussion
concerning the various estimation techniques and the properties of the
resulting estimates. Over the years several of these descriptive properties
or characteristics have emerged as "desirable" traits of estimators. After
presenting some concepts and definitions, several of the properties usually
related to maximum likelihood estimates are briefly discussed.


Symbol/Term        Definition

$x$                A sample or outcome of observed values of the
                   random variables

$\theta$           The parameters of an experiment; generally indices
                   for some family of probability distributions

parameter          A constant of a probability distribution, generally
                   unknown in estimation problems

$f(x;\theta)$      The probability density function of the random
                   variable $X$ with parameter indexed by $\theta$,
                   denoted pdf

$E(x)$             The expectation of $x$

estimator          A statistic; a rule for making an estimate of a
                   parameter; a function of the observed values of the
                   random variables. An estimator is derived prior to
                   sampling.

estimate           A numerical value assigned to a parameter of a
                   distribution on the basis of evidence from samples;
                   an observed value of an estimator. An estimate is
                   made after the sampling. (In this paper estimate
                   will imply "statistical point estimate" unless
                   otherwise indicated.)

The distinction between the parameter and its estimate is an important
one. The true parameter value is fixed and unknown. However, with
repetition of an experiment, the sample will vary, and the estimate itself
will vary and will have a probability distribution. Estimation techniques
are derived with the assumption that a sample is representative of the true
population; therefore, the parameter estimate is subject to sampling errors.
The possible magnitude of sampling error is an important consideration and
leads to interval estimation, which is not discussed in this paper.

2.2 The Method of Maximum Likelihood Estimation

The method of moments, introduced by Karl Pearson in 1894, was the
earliest formal technique proposed for point estimation. Since that time
many estimation procedures have been devised, the best known of which are
the methods of minimum chi square, Bayes, minimax, least squares, and
maximum likelihood. It has been said that in many respects the introduction
of maximum likelihood estimation marked the era of modern statistical
theory.^1 The principle of maximum likelihood was discussed by Gauss prior
to 1880, but R. A. Fisher formally developed maximum likelihood estimation
(MLE) as a technique in a series of papers, the first of which was presented
in 1921 (20).

^1 D. A. Fraser, Statistics, an Introduction, John Wiley and Sons, Inc.,
p. 224, 1958
Gauss had stated the concept in the following manner. Assume a random
variable (vector) $X$ with real values $(x_1, \ldots, x_n)$ where the pdf of
$X$ is a function of the parameter(s) indexed by $\theta$. Let $\theta$ have
a known or assumed prior distribution with range $a$ to $b$. Then the
a posteriori distribution of $\theta$ given $X = x$ is

$$f(\theta \mid x) = \frac{f(\theta, x)}{\int_a^b f(\theta, x)\, d\theta} = c(x)\, f(\theta, x).$$

Gauss used the mode of the derived a posteriori distribution as an estimate
of $\theta$. This value is what is commonly known today as the maximum
likelihood estimate, the value of $\theta$ which maximizes the pdf of $X$
with respect to $\theta$.^2

Fisher in his development derived what became known as the "Likelihood
Function", the product of the population densities for each value in the
sample. This function is denoted $L(\theta)$, where
$L(\theta) = \prod_{i=1}^{n} f(x_i;\theta)$, and is regarded as a function
of $\theta$ for fixed $x_i$. The method of maximum likelihood estimation is
defined by maximizing this function. Since the logarithm is a monotonic
increasing function, $L(\theta)$ and its log are maximized by the same value
of $\theta$. This is sometimes convenient since manipulation of
$\log L(\theta)$ is often much easier than working with the function
directly.

The procedure for determining the mle of $\theta$ is as follows:

1) Determine the pdf, $f(x;\theta)$.

2) Determine $L(\theta) = \prod_{i=1}^{n} f(x_i;\theta)$ and express as
$\log L(\theta)$ if appropriate.

3) Determine a value of $\theta$ which will maximize $L(\theta)$. This value
is usually found by setting the derivative(s) of the likelihood function
with respect to $\theta$ equal to zero and solving the ensuing equation(s)
for the parameter value(s) when conditions exist that make this possible.
If $L(\theta)$ is differentiable and has its maximum at an interior point of
the range of $\theta$, the point at which $L(\theta)$ attains this maximum
is the mle of $\theta$, denoted $\hat{\theta}$, and the "Likelihood
Equation" is

$$\left[\frac{\partial}{\partial\theta} \log L(\theta)\right]_{\theta=\hat{\theta}} = 0.$$

^2 E. L. Lehmann, Notes on the Theory of Estimation, University of
California, p. 1-9, 1950

Setting the derivative of a function equal to zero and solving in terms
of a parameter does not in itself guarantee a maximizing value. If there
is any doubt as to the authenticity of the solution, there should be further
investigation to verify the underlying assumption, namely that the
likelihood equations generally have only maximizing solutions. Lindgren
points out that this is usually the case since $L(\theta)$ is a product of
probability densities and is usually bounded above and continuous in
$\theta$.^3
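The procedure and the caution above can be sketched numerically. The following is a minimal illustration, assuming the exponential density $f(x;\theta) = \theta e^{-\theta x}$ and a small hypothetical sample; the function names are illustrative and do not come from the thesis. It solves the likelihood equation in closed form and then checks, rather than assumes, that the root is a maximum.

```python
import math

def log_likelihood(theta, xs):
    """log L(theta) = n*log(theta) - theta*sum(x_i) for f(x; theta) = theta*exp(-theta*x)."""
    return len(xs) * math.log(theta) - theta * sum(xs)

def score(theta, xs):
    """First derivative of log L(theta) with respect to theta."""
    return len(xs) / theta - sum(xs)

def curvature(theta, xs):
    """Second derivative of log L(theta); negative at a maximum."""
    return -len(xs) / theta ** 2

sample = [1.0, 2.0, 3.0, 2.0]          # hypothetical observations
theta_hat = len(sample) / sum(sample)  # root of the likelihood equation, 1 / xbar

print(theta_hat)                         # candidate mle
print(score(theta_hat, sample))          # vanishes at a root of the likelihood equation
print(curvature(theta_hat, sample) < 0)  # True indicates a maximizing solution
```

For this density the second derivative is negative everywhere, so the single root is the maximum; for other densities the same check would have to be made case by case.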

2.3 Desirable Properties of Estimators

There are many ways that an estimator may be chosen. Hopefully
statistical techniques provide the tools for choosing "good" estimators. To
help describe what is meant by "good", several generally desirable
properties or characteristics of estimators have been defined. The
properties usually associated with mle's are discussed below.

1) Unbiasedness: This property is concerned with the distribution of the
estimator. An estimator $\hat{\theta}(x_1, \ldots, x_n)$ for the parameter
$\theta$ is said to be unbiased if $E(\hat{\theta}) = \theta$. Then, the
bias of $\hat{\theta}$, denoted $b$, is $b = E(\hat{\theta}) - \theta$.
Although unbiasedness is a desirable trait, it is by no means paramount.
Figure 1 shows the densities of three estimators of $\theta$. Although
$\hat{\theta}_1$ and $\hat{\theta}_2$ are both unbiased, $\hat{\theta}_3$ is
obviously the best estimator of the three even though it has positive or
right bias. It is apparent that unbiasedness considered alone does not
guarantee a good estimator. The distribution, variance, and sample size all
modify the bias.

^3 B. W. Lindgren, Statistical Theory, The Macmillan Company, p. 222,
1960.

2) Consistency: Consistency is a large sample property of an estimator.
An estimator is said to be consistent if its probability distribution
concentrates on the true parameter value as the sample size becomes
infinite. That is, $\hat{\theta}$ is consistent if
$P(|\hat{\theta} - \theta| < \epsilon) \to 1$ as $n \to \infty$ for every
$\epsilon > 0$.

Figure 1. Density Functions of Three Estimators of the Parameter $\theta$

An unbiased estimator is consistent if its variance approaches zero as the
sample size approaches infinity.^4

There may be many consistent estimators of a parameter. Therefore, as
with unbiasedness, the criterion of consistency alone does not guarantee a
useful estimator, although consistency is usually a desirable property.

3) Efficiency: Efficiency provides a criterion for comparing unbiased
estimates of a parameter. As mentioned previously, once it is known that the
distribution of the estimator is centered on the true value of the
parameter, the variance of the distribution becomes an important
consideration. If $\hat{\theta}_1$ and $\hat{\theta}_2$ are estimates of
$\theta$, the "Relative Efficiency" of $\hat{\theta}_1$ to $\hat{\theta}_2$
is $(A_2/A_1) \times 100\%$, where $A_i = E(\hat{\theta}_i - \theta)^2$. If
the ratio $A_2/A_1$ is greater than one, $\hat{\theta}_1$ may be considered
a more efficient, and therefore perhaps a more suitable, estimate than
$\hat{\theta}_2$. If $\hat{\theta}_1$ and $\hat{\theta}_2$ are unbiased
estimates of $\theta$, then $A_2/A_1$ is a ratio of variances and will take
on its highest values when $\hat{\theta}_1$ is an estimate with minimum
variance. R. A. Fisher proposed that the estimator having a minimum variance
in large samples should be called "Efficient". This idea was formalized by a
definition very similar to the following:

Definition: $\hat{\theta}$ is said to be an efficient estimator of $\theta$
if:

1) $\sqrt{n}(\hat{\theta} - \theta)$ approaches $N(0, \sigma^2)$ as $n$
approaches infinity.

2) for any other estimator $\theta^*$ for which
$\sqrt{n}(\theta^* - \theta)$ approaches $N(0, \sigma^{*2})$,
$\sigma^2 \le \sigma^{*2}$. (The efficiency of $\theta^*$ is
$(\sigma^2/\sigma^{*2}) \times 100\%$.)

^4 A. M. Mood, Introduction to the Theory of Statistics, McGraw-Hill
Book Company, Inc., p. 149, 1950.

^5 H. Cramer, Mathematical Methods of Statistics, Princeton University
Press, p. 551, 1946
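As a small worked instance of the relative efficiency ratio $(A_2/A_1) \times 100\%$, the sketch below compares two unbiased estimators of a normal mean: the sample mean of $n$ observations, whose variance is $\sigma^2/n$, and a single observation, whose variance is $\sigma^2$. The numerical values of $\sigma^2$ and $n$ are hypothetical, and the function name is illustrative only.

```python
def relative_efficiency(a1, a2):
    """Relative efficiency of estimator 1 to estimator 2, in percent: (A2 / A1) * 100."""
    return a2 / a1 * 100.0

sigma2, n = 4.0, 10        # hypothetical population variance and sample size
a_mean = sigma2 / n        # A_1: mean square error of the sample mean
a_single = sigma2          # A_2: mean square error of a single observation

print(relative_efficiency(a_mean, a_single))
```

The ratio exceeds 100 percent by a factor of $n$, which agrees with the remark that the estimate with the smaller variance is the more efficient of the two.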

The Cramer-Rao inequality^6 may be used to find the limiting value of
mean square deviations (variances for unbiased estimators). Efficient
estimators are consistent but are not necessarily unbiased except in the
limit.^7

4) Sufficiency: An estimator is sufficient if "it contains all the
information in the sample regarding the parameter";^8 that is, it utilizes
all of the pertinent information in the sample.

^6 ibid, p. 477

^7 A. M. Mood, Introduction to the Theory of Statistics, McGraw-Hill
Book Company, Inc., p. 151, 1950

^8 ibid
Definition: $\hat{\theta}$ is a sufficient estimator of $\theta$ if, given
the value of $\hat{\theta}(x_1, \ldots, x_n)$, the conditional distribution
of the sample is independent of the parameter $\theta$.

In many situations the evaluation and manipulation of conditional
distributions is very difficult; however, the following criterion allows
determination of sufficiency by discerning if the joint density function can
be properly factored.

Theorem 2.1: An estimator is sufficient if and only if the probability
density function can be factored into two functions $g$ and $h$, where $h$
is dependent on the estimator and the parameter and $g$ is independent of
the parameter. That is, $\hat{\theta}$ is sufficient if
$\prod_{i=1}^{n} f(x_i;\theta) = g(x_1, \ldots, x_n)\, h(\hat{\theta}, \theta)$.

If sufficient statistics exist, it has been shown that they will be
solutions of maximum likelihood.^9
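A brief worked instance of the factorization in Theorem 2.1 may help. The choice of the density $f(x;\theta) = \theta^{x}(1-\theta)^{1-x}$, $x = 0, 1$, is an assumption made here for illustration; with $\hat{\theta} = \bar{x}$ the joint density factors as follows.

```latex
\prod_{i=1}^{n} \theta^{x_i}(1-\theta)^{1-x_i}
  = \theta^{n\bar{x}}(1-\theta)^{n(1-\bar{x})}
  = \underbrace{1}_{g(x_1,\ldots,x_n)} \cdot
    \underbrace{\theta^{n\hat{\theta}}(1-\theta)^{n(1-\hat{\theta})}}_{h(\hat{\theta},\,\theta)},
  \qquad \hat{\theta} = \bar{x}.
```

Here $g$ does not involve $\theta$ and $h$ depends on the sample only through $\hat{\theta}$, so $\bar{x}$ is sufficient for $\theta$ under this assumed density.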

5) Invariance: This property, which is to be discussed at length in
chapter three, is usually associated with maximum likelihood estimation.
The property implies that if the mle of $\theta$ is $\hat{\theta}$ and
certain regularity conditions are satisfied, a mle of $\phi(\theta)$ is
$\phi(\hat{\theta})$. That is, a mle of a function of $\theta$ is simply the
function with the value of $\hat{\theta}$ substituted for $\theta$.
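The substitution rule can be checked numerically in a simple 1-1 case. The sketch below assumes an exponential density $f(x;\theta) = \theta e^{-\theta x}$, a hypothetical sample, and the transformation $\mu = \phi(\theta) = 1/\theta$ (the mean); these choices and all names are illustrative. The likelihood is re-expressed in terms of $\mu$ and maximized over a grid, and the maximizer is compared with $\phi(\hat{\theta})$.

```python
import math

def log_L(theta, xs):
    """log L(theta) for the assumed density f(x; theta) = theta*exp(-theta*x)."""
    return len(xs) * math.log(theta) - theta * sum(xs)

def log_M(mu, xs):
    """The same likelihood written in terms of mu = phi(theta) = 1/theta."""
    return log_L(1.0 / mu, xs)

sample = [1.0, 2.0, 3.0, 2.0]                 # hypothetical observations, xbar = 2
theta_hat = len(sample) / sum(sample)         # mle of theta, 1 / xbar

grid = [k / 100 for k in range(1, 1001)]      # mu = 0.01, 0.02, ..., 10.00
mu_hat = max(grid, key=lambda mu: log_M(mu, sample))

print(mu_hat, 1.0 / theta_hat)                # grid maximizer and phi(theta_hat)
```

A grid search is used only to keep the check independent of calculus; it finds the maximizer to the resolution of the grid.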

Maximum likelihood estimates are usually biased, consistent, efficient,
invariant, and a function of a sufficient statistic if one exists. Under
fairly general regularity conditions $\hat{\theta}$ is asymptotically
normally distributed, has finite variance with limiting value
$\sigma^2 = 1/I(\theta)$, where
$I(\theta) = E\left[\frac{\partial}{\partial\theta} \log f(X;\theta)\right]^2$,
and therefore is asymptotically efficient.^10 No other asymptotically
normally distributed estimator can have smaller variance.^11 If an efficient
statistic exists for small samples (i.e., with minimum variance), a mle with
bias correction, if necessary, will be it.^12 This follows from the fact
that if there is an unbiased efficient estimate, the maximum likelihood
method will produce it.^13 Similarly, if there is a sufficient statistic for
estimating the true parameter value, any solution of the likelihood equation
will be a function of it.

From the preceding summary, it can be seen why MLE has become a favored
and often used technique in the field of statistical estimation. Although
each of the estimation techniques has its strong points and proponents
(Pearson hotly defended the method of moments as "best" (44)), the mle is
generally expected to exhibit more of the desirable properties of a point
estimator. Still, for certain instances, depending on the situation and
problem at hand, the use of other estimation techniques may seem more
logical and/or be easier. In fact, for certain distributions different
techniques may produce the same estimate, although generally they are
different. The methods of moments and maximum likelihood produce the same
estimates for the parameters of the normal, Poisson, and binomial
probability distributions.^14

^9 R. A. Fisher, Contributions to Mathematical Statistics, John Wiley
and Sons, Inc., p. 224, 1958

^10 H. Cramer, Mathematical Methods of Statistics, Princeton University
Press, pp. 500-506, 1946

^11 A. M. Mood, Introduction to the Theory of Statistics, McGraw-Hill
Book Company, Inc., p. 160, 1950

^12 R. L. Anderson and T. A. Bancroft, Statistical Theory in Research,
McGraw-Hill Book Company, Inc., p. 102, 1952

^13 B. W. Lindgren, Statistical Theory, The Macmillan Company, p. 226,
1960

^14 S. S. Wilks, Mathematical Statistics, Princeton University Press,
p. 146, 1945
2.4 Review of the Literature

A search of literature failed to yield any new and significant
information concerning the application of the invariance principle to MLE.
Of the college level statistics texts reviewed, 17 contained sections on
point estimation. Of these, only five discussed invariance as associated
with maximum likelihood estimation, and each of these was restricted by the
condition that the functional relationship be single valued or one-to-one.
It is interesting to note that Mood^15, in his discussion of the property of
invariance as applicable to MLE, states that, ". . . if $\hat{\theta}$ is
the maximum-likelihood estimate for $\theta$, and if $u(\theta)$ is any
single-valued function of $\theta$, then $u(\hat{\theta})$ is the
maximum-likelihood estimate for $u(\theta)$." However, in his proof of this
property, it is implicitly assumed that an inverse function
$\theta = v(u)$ is defined, and he shows that the mle for $u$ is the value
of $u$ that maximizes $L(v(u))$. Then, in addition to the necessity of
having a single-valued function for the property to be applied as described
by Mood, the inverse function must also exist. But even when the function is
single-valued (but many to one, of course) there are many ways to define an
inverse function. As shall be seen below, special care must be exercised in
defining such an inverse. This also illustrates one of the situations
motivating this investigation, namely, that discussions of the invariant
property are often incomplete in the above sense.

A conspicuous absence of literature concerning this property could be
construed to indicate either that the problem is so trivial that it is
unnecessary to record methods of application, or that the problem is of no
practical or theoretical interest. Preliminary investigation of the problem
at hand leads to rejection of both alternatives. The SP study mentioned in
the opening pages of this paper is just one of many indicators that the
problem is not trivial. Also, it provides a real practical application of
the invariant property of MLE in an area not adequately covered by present
concepts and definitions.

^15 A. M. Mood, Introduction to the Theory of Statistics, McGraw-Hill
Book Company, Inc., p. 159, 1950
SECTION III


A NEW APPROACH


3.1 The Induced Likelihood Function

We have seen that previous applications of the invariance principle to
the method of maximum likelihood estimation have been restricted so as to
provide a 1-1 relationship between the domain and range of the functions of
the parameters being estimated. Below, a theorem is stated as it usually
occurs in the 1-1 estimation problem.

Theorem 3.1:

If 1) $f: S \to E^1$ (read, the function $f$ maps $S$ into $E^1$)

   2) $\phi: S \to S^*$ is 1-1 and onto, therefore
      $\phi^{-1}: S^* \to S$ is 1-1 and onto

   3) $g: S^* \to E^1$ defined by $g(\theta^*) = f(\phi^{-1}(\theta^*))$

   4) there exists an element of $S$ denoted $\theta_0$, such that
      $f(\theta_0) \ge f(\theta)$ for all $\theta$ in $S$

then $g(\theta_0^*) \ge g(\theta^*)$ for all $\theta^*$ in $S^*$, where
$\theta_0^* = \phi(\theta_0)$.

Proof: 1) Let $\theta^*$ be an element of $S^*$. Then
          $\phi^{-1}(\theta^*)$ is an element of $S$ and

       2) $g(\theta^*) = f(\phi^{-1}(\theta^*)) \le f(\theta_0) = f(\phi^{-1}(\theta_0^*)) = g(\theta_0^*)$,

          i.e. $g(\theta_0^*) \ge g(\theta^*)$ for all $\theta^*$ in $S^*$,
          and if $\theta_0$ is unique, strict inequality holds and
          $\theta_0^*$ is unique.

So far both the theorem and notation are conventional, and application
of the theorem to maximum likelihood estimation is as follows. Let $S$ be
the parameter space of the estimation problem; $E^1$ is the real line. The
likelihood function $L(\theta)$ is such that $L: S \to E^1$ and
$L(\hat{\theta}) \ge L(\theta)$ for all $\theta$ an element of $S$. Suppose
there exists a function $\phi$ such that $\phi: S \to S^*$ is 1-1 and onto,
so that $\phi^{-1}: S^* \to S$. Define the "Induced Likelihood Function",
$M(\theta^*) = L(\phi^{-1}(\theta^*))$, the likelihood function induced on
$S^*$. We now have the essential elements to apply theorem 3.1.

1) $L: S \to E^1$

2) $\phi: S \to S^*$ 1-1 and onto; $\phi^{-1}: S^* \to S$

3) $M: S^* \to E^1$

4) $\hat{\theta}$ is the value of $\theta$ such that
   $L(\hat{\theta}) \ge L(\theta)$ for all $\theta$ an element of $S$

Therefore if $\theta_0^* = \phi(\hat{\theta})$, by theorem 3.1
$M(\theta_0^*) \ge M(\theta^*)$ for all $\theta^*$ in $S^*$ and a mle of
$\theta^*$ is $\theta_0^* = \phi(\hat{\theta})$. If $\hat{\theta}$ is
unique, then $\theta_0^*$ is unique.
Although it becomes apparent with application, let it be emphasized at
this point that the concept of the induced likelihood function (ILF) and
the manner in which it is defined is a most important element of the
application of the theorem. A new likelihood function is defined on $S^*$
and the mle is the parameter value in $S^*$ which maximizes this new
function.

Prior to looking at situations in which the 1-1 condition is dropped,
consider the following interesting example which emphasizes the importance
of the definition of the new likelihood function on the transformed
parameter space $S^*$. Let all of the essential conditions of theorem 3.1
hold and let $S^*$ be contained in $S$. The theorem still applies and along
with conventional MLE procedures produces $\phi(\hat{\theta})$ as the mle of
$\theta^*$. That is, the MLE procedure on $S$ is carried out as usual and
produces $\hat{\theta}$, a mle of $\theta$. However, in this case
$L(\theta)$ is defined not only on $S$ but on $S^*$ as well. What happens
when the likelihood function is restricted to $S^*$? Naturally, it is not
expected that the restricted mle will always be the same as that produced in
the unrestricted case, since the unrestricted estimate may not be a member
of $S^*$. However, the interesting fact is that $\phi(\hat{\theta})$ is not
necessarily the restricted mle even when $\hat{\theta}$ is an element of
$S^*$. As an example consider the exponential
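distribution.

1) Let $f(x;\theta) = \theta e^{-\theta x}$ for $\theta > 0$.

2) $L(\theta) = \prod_{i=1}^{n} f(x_i;\theta) = \theta^{n} e^{-n\bar{x}\theta}$

   $L'(\theta) = n\theta^{n-1} e^{-n\bar{x}\theta} (1 - \bar{x}\theta)$,
   so that $\hat{\theta} = 1/\bar{x}$.

3) Let $\lambda = \phi(\theta)$ be a 1-1 transformation of $S$ onto
   $S^* = (0, 1)$, so that $\theta = \phi^{-1}(\lambda)$ and
   $g(x;\lambda) = \phi^{-1}(\lambda)\, e^{-\phi^{-1}(\lambda) x}$ for
   $0 < \lambda < 1$.

4) Let $M(\lambda) = L(\theta \mid 0 < \theta < 1)$ for the restricted
   estimation problem. Then $\hat{\theta} = 1/\bar{x}$ for
   $\bar{x} \ge 1$ and is undefined for $\bar{x} < 1$.

5) However, if $M(\lambda) = L(\phi^{-1}(\lambda))$, all of the essential
   conditions of theorem 3.1 are fulfilled.

   1) $L(\theta): S \to E^1$

   2) $\phi(\theta): S \to S^*$ is 1-1 and onto

   3) $M(\lambda): S^* \to E^1$ is defined by
      $M(\lambda) = L(\phi^{-1}(\lambda))$

   4) $\hat{\theta}$ is the value of $\theta$ such that
      $L(\hat{\theta}) \ge L(\theta)$ for all $\theta$ in $S$.

Therefore by theorem 3.1 $M(\lambda)$ is maximized by
$\hat{\lambda} = \phi(\hat{\theta}) = \phi(1/\bar{x})$, and the restricted
mle ($1/\bar{x}$ for $\bar{x} \ge 1$) is not equal to $\phi(\hat{\theta})$.
Had $S^*$ not been contained in $S$, the defining of the ILF would be
absolutely necessary since $L(\theta)$ would have no meaning on $S^*$.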
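The contrast between the restricted likelihood and the induced likelihood can be made concrete with a short numerical sketch. The transformation used below, $\phi(\theta) = \theta/(1+\theta)$, is an assumption chosen for illustration (any 1-1 map of $(0, \infty)$ onto $(0, 1)$ would serve), the sample is hypothetical, and the names are illustrative. Both functions are maximized over the same grid on $S^* = (0, 1)$.

```python
import math

def log_L(theta, xs):
    """log L(theta) for f(x; theta) = theta*exp(-theta*x)."""
    return len(xs) * math.log(theta) - theta * sum(xs)

def phi(theta):
    """Assumed 1-1 map of S = (0, inf) onto S* = (0, 1); illustrative only."""
    return theta / (1.0 + theta)

def phi_inverse(lam):
    """Inverse of the assumed map: theta = lam / (1 - lam)."""
    return lam / (1.0 - lam)

sample = [1.0, 2.0, 3.0, 2.0]                 # xbar = 2, so theta_hat = 0.5 lies in S*
theta_hat = len(sample) / sum(sample)
grid = [k / 300 for k in range(1, 300)]       # interior points of (0, 1)

# Restricted problem: L itself, evaluated only on S*.
restricted = max(grid, key=lambda t: log_L(t, sample))
# Induced likelihood: M(lam) = L(phi_inverse(lam)), also a function on S*.
induced = max(grid, key=lambda lam: log_L(phi_inverse(lam), sample))

print(restricted, induced, phi(theta_hat))
```

Under these assumptions the restricted maximizer coincides with $\hat{\theta}$ while the induced maximizer coincides with $\phi(\hat{\theta})$, so two different functions on the same set $S^*$ give two different "mle's".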

Taking note of the use of the 1-1 property in conventional maximum
likelihood estimation, it is seen that the assumption that $\phi$ is 1-1 is
used only in defining $M(\theta^*)$ as a single valued function. If $\phi$
is not 1-1, how may the MLE problem be handled? As before, the key concept
is the characterization of the new likelihood function, and it can be shown
that, with proper definition of the ILF, it is still maximized at
$\phi(\hat{\theta})$.

Consider the case where $f: S \to E^1$ and $\phi: S \to S^*$ is onto, that
is, the function is exhaustive but not necessarily 1-1. The ILF, the
likelihood function induced on $S^*$, is defined in the following manner. If
$L(\hat{\theta}) \ge L(\theta)$ for all $\theta$ an element of $S$, then let
$\theta_0^*$ be any value of $\phi(\hat{\theta})$. Using the Axiom of
Choice, if necessary, define an inverse on $S^*$ such that
$\phi^{-1}(\theta_0^*) = \hat{\theta}$ and, for any other $\theta^*$ in
$S^*$, $\phi^{-1}(\theta^*) = \theta$, where $\theta$ is any element of $S$
such that $\phi(\theta) = \theta^*$. Then $\phi^{-1}: S^* \to S$. Now
theorem 3.1 can be extended and stated in a more general form.

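Theorem 3.2:

If 1) $f: S \to E^1$ and $f(\hat{\theta}) \ge f(\theta)$ for all $\theta$
      in $S$

   2) $\phi: S \to S^*$ is onto, $\phi(\hat{\theta}) = \theta_0^*$,
      and $\phi^{-1}: S^* \to S$ is defined as above

   3) $g: S^* \to E^1$ defined by $g(\theta^*) = f(\phi^{-1}(\theta^*))$

then $g(\theta_0^*) \ge g(\theta^*)$ for all $\theta^*$ in $S^*$.

Proof:

1) Let $\theta^*$ be an element of $S^*$.

2) $g(\theta^*) = f(\phi^{-1}(\theta^*)) = f(\theta) \le f(\hat{\theta}) = f(\phi^{-1}(\theta_0^*)) = g(\theta_0^*)$

Thus, $g(\theta_0^*) \ge g(\theta^*)$ for all $\theta^*$ an element of
$S^*$.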

In the estimation problem let $M(\theta^*) = L(\phi^{-1}(\theta^*))$.
Then $M(\theta_0^*) \ge M(\theta^*)$, so $M(\theta^*)$ is maximized by
$\theta_0^* = \phi(\hat{\theta})$. The mle of $\theta^*$ is
$\phi(\hat{\theta})$ just as in the 1-1 estimation situation. The
maximization of $M(\theta^*)$ may not, in effect, have been over all the
elements in $S$ since $\phi^{-1}$ is not onto $S$, but it has taken place
over the set containing $\hat{\theta}$, which is the essential factor.

Having repeatedly emphasized the importance of the definition of $M$,
the ILF, it seems reasonable at this point to acknowledge the fact that
there may be many ways to define $\phi^{-1}$ and $M$. In some cases the
definitions may be such that $M$ is not maximized at $\phi(\hat{\theta})$,
but this is not necessarily brought on by dropping the 1-1 restriction, and
in fact these same remarks apply even in the 1-1 case. In the restricted
exponential estimation example, which was 1-1, two likelihood functions were
defined on $S^*$ and one was not maximized at $\phi(\hat{\theta})$.

Although the term "likelihood function" has been used extensively in
theoretical statistics for quite a number of years, it appears that the term
may be used rather loosely unless more emphasis is placed on the definition
in a given problem. It is suggested that the notion of ILF may be an idea
which will help to emphasize this point.
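The construction of the inverse in theorem 3.2 can be sketched on a finite parameter space, where no appeal to the Axiom of Choice is needed. Everything below is an illustrative assumption: the set `S`, the real-valued function `f` (a toy objective standing in for a likelihood, since the theorem only requires a map into $E^1$), the map `phi`, and the function names.

```python
def induced_function(S, f, phi, force_at_maximizer=True):
    """Build g(theta*) = f(phi_inverse(theta*)) on S* = phi(S) for a finite S."""
    theta_hat = max(S, key=f)                   # maximizer of f on S
    inverse = {}
    for theta in S:
        inverse.setdefault(phi(theta), theta)   # an arbitrary preimage (first one seen)
    if force_at_maximizer:
        inverse[phi(theta_hat)] = theta_hat     # the choice required by theorem 3.2
    g = {t_star: f(theta) for t_star, theta in inverse.items()}
    return theta_hat, g

S = [-2, -1, 0, 1, 2]                 # hypothetical parameter space
f = lambda theta: -(theta - 1) ** 2   # toy objective, maximized at theta = 1
phi = lambda theta: theta ** 2        # onto S* = {0, 1, 4}, not 1-1

theta_hat, g = induced_function(S, f, phi)
best = max(g, key=g.get)              # maximizer of g on S*

_, g_naive = induced_function(S, f, phi, force_at_maximizer=False)
best_naive = max(g_naive, key=g_naive.get)

print(best, phi(theta_hat), best_naive)
```

With the inverse pinned at $\hat{\theta}$ the maximizer of $g$ is $\phi(\hat{\theta})$; with an arbitrary choice of preimage it need not be, which is the point made about the many possible definitions of $\phi^{-1}$ and $M$.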

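3.2 Examples of the Application of Theorem 3.2 and the Induced Likelihood
Function to Maximum Likelihood Estimation

3.2.1 Geometric Distribution

1) Let $f(x;\theta) = \theta(1-\theta)^{x-1}$, $0 \le \theta \le 1$

2) $L(\theta) = \theta^{n}(1-\theta)^{n(\bar{x}-1)}$

   $L'(\theta) = n\theta^{n-1}(1-\theta)^{n\bar{x}-n} - \theta^{n}\, n(\bar{x}-1)(1-\theta)^{n\bar{x}-n-1}$

   $0 = n\hat{\theta}^{n-1}(1-\hat{\theta})^{n\bar{x}-n-1}\,[1 - \hat{\theta} - (\bar{x}-1)\hat{\theta}]$

   $\hat{\theta} = 1/\bar{x}$

3) Let $\lambda = \phi(\theta) = \theta$ for $0 \le \theta \le \tfrac{1}{2}$
   and $\lambda = \phi(\theta) = 1 - \theta$ for
   $\tfrac{1}{2} \le \theta \le 1$.

   $\phi: [0, 1] \to [0, \tfrac{1}{2}]$ is onto and is not 1-1.

4) Define $\phi^{-1}(\lambda) = \theta = \lambda$ if $\bar{x} \ge 2$ and
   $\phi^{-1}(\lambda) = \theta = 1 - \lambda$ if $\bar{x} \le 2$.

   $\therefore\ \phi^{-1}: [0, \tfrac{1}{2}] \to [0, 1]$

5) Let $M(\lambda) = L(\phi^{-1}(\lambda))$, that is,
   $M(\lambda) = L(\lambda)$ if $\bar{x} \ge 2$ and
   $M(\lambda) = L(1-\lambda)$ if $\bar{x} \le 2$.

   Therefore by theorem 3.2, $\hat{\lambda} = \phi(\hat{\theta})$, that is,
   $\hat{\lambda} = \hat{\theta} = 1/\bar{x}$ if $\bar{x} \ge 2$ and
   $\hat{\lambda} = 1 - \hat{\theta} = 1 - 1/\bar{x}$ if $\bar{x} \le 2$.

6) Checking the results directly:

   for $\bar{x} \ge 2$, $M(\lambda) = L(\lambda)$, therefore
   $\hat{\lambda} = \hat{\theta} = 1/\bar{x}$;

   for $\bar{x} \le 2$, $M(\lambda) = L(1-\lambda)$:

   $M(\lambda) = (1-\lambda)^{n}\,[1-(1-\lambda)]^{n(\bar{x}-1)}$

   $M'(\lambda) = -n(1-\lambda)^{n-1}\lambda^{n\bar{x}-n} + (1-\lambda)^{n}\, n(\bar{x}-1)\lambda^{n\bar{x}-n-1}$

   $0 = n(1-\hat{\lambda})^{n-1}\hat{\lambda}^{n\bar{x}-n-1}\,[-\hat{\lambda} + (1-\hat{\lambda})(\bar{x}-1)]$

   $1 = \bar{x}(1-\hat{\lambda})$

   $\hat{\lambda} = 1 - 1/\bar{x}$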

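3.2.2  Normal Distribution

1)  Let $f(x;\theta) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(x-\theta)^2}$,  $-\infty < \theta < \infty$

2)  $L(\theta) = \left(\frac{1}{2\pi}\right)^{n/2} e^{-\frac{1}{2}\sum (x_i-\theta)^2}$

    $L'(\theta) = \left(\frac{1}{2\pi}\right)^{n/2} e^{-\frac{1}{2}\sum (x_i-\theta)^2}\,\left[\sum (x_i-\theta)\right]$

    $0 = \sum x_i - n\hat\theta$

    $\hat\theta = \bar{x}$

3)  Let $\lambda = \phi(\theta) = \theta^2$

    $\phi : E_1 \to [0,\infty)$ and is not 1-1

4)  Define $\phi^{-1}(\lambda) = \theta = \begin{cases} \sqrt{\lambda} & \text{if } \bar{x} \ge 0 \\ -\sqrt{\lambda} & \text{if } \bar{x} < 0 \end{cases}$

    $\therefore\ \phi^{-1} : [0,\infty) \to E_1$

5)  Let $M(\lambda) = L(\phi^{-1}(\lambda)) = \begin{cases} L(\sqrt{\lambda}) & \text{if } \bar{x} \ge 0 \\ L(-\sqrt{\lambda}) & \text{if } \bar{x} < 0 \end{cases}$

    Therefore by theorem 3.2

    $\hat\lambda = \phi(\hat\theta) = \hat\theta^2 = \bar{x}^2$

6)  Checking the results directly

    if $\bar{x} \ge 0$,  $M(\lambda) = L(\sqrt{\lambda}) = \left(\frac{1}{2\pi}\right)^{n/2} e^{-\frac{1}{2}\sum (x_i-\sqrt{\lambda})^2}$

    and $\sqrt{\hat\lambda} = \bar{x}$  $\therefore\ \hat\lambda = \bar{x}^2$

    if $\bar{x} < 0$,  $M(\lambda) = L(-\sqrt{\lambda}) = \left(\frac{1}{2\pi}\right)^{n/2} e^{-\frac{1}{2}\sum (x_i+\sqrt{\lambda})^2}$

    $M'(\lambda) = -\frac{1}{2\sqrt{\lambda}} \left(\frac{1}{2\pi}\right)^{n/2} e^{-\frac{1}{2}\sum (x_i+\sqrt{\lambda})^2}\,\left[\sum (x_i+\sqrt{\lambda})\right]$

    $0 = \sum x_i + n\sqrt{\hat\lambda}$

    $\sqrt{\hat\lambda} = -\bar{x}$  $\therefore\ \hat\lambda = \bar{x}^2$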
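The role of the sign choice in $\phi^{-1}$ can be illustrated numerically. The sketch below is an illustration under stated assumptions rather than part of the original development: the data are hypothetical, the variance is taken to be 1 as in the density above, and the helper names and the `branch` argument are invented for the example. With $\bar{x} < 0$, the induced likelihood built on the negative root is maximized at $\bar{x}^2$, whereas the function built on the positive root is maximized at $\lambda = 0$.

```python
import math

def log_lik(theta, xs):
    """Log-likelihood of N(theta, 1), omitting the constant -(n/2) log(2 pi)."""
    return -0.5 * sum((x - theta) ** 2 for x in xs)

def maximize_on_branch(xs, branch, lam_max=25.0, steps=25000):
    """Grid-search maximizer of L(branch * sqrt(lam)) over [0, lam_max].

    branch = +1 uses the positive root, branch = -1 the negative root.
    """
    grid = (lam_max * k / steps for k in range(steps + 1))
    return max(grid, key=lambda lam: log_lik(branch * math.sqrt(lam), xs))

sample = [-1.5, -0.5, -2.0, 0.5, -1.5]      # hypothetical data, xbar = -1.0
xbar = sum(sample) / len(sample)

lam_closed_form = xbar ** 2                 # theorem 3.2: lam_hat = phi(xbar)
lam_induced = maximize_on_branch(sample, branch=-1 if xbar < 0 else 1)
lam_other_branch = maximize_on_branch(sample, branch=1 if xbar < 0 else -1)
print(lam_closed_form, lam_induced, lam_other_branch)
```

The comparison shows why the inverse must be defined with reference to the sign of $\bar{x}$: only that choice yields a function on $[0,\infty)$ whose maximum sits at $\phi(\hat\theta)$.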

3.2.3  Binomial  Distribution 


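1)  Let $f(x;\theta) = \theta^{x}(1-\theta)^{1-x}$,  $x = 0, 1$,  for $0 < \theta < 1$

2)  $L(\theta) = \theta^{n\bar{x}}(1-\theta)^{n(1-\bar{x})}$

    $L'(\theta) = n\bar{x}\,\theta^{n\bar{x}-1}(1-\theta)^{n-n\bar{x}} - n(1-\bar{x})\,\theta^{n\bar{x}}(1-\theta)^{n-n\bar{x}-1}$

    $\phantom{L'(\theta)} = n\,\theta^{n\bar{x}-1}(1-\theta)^{n-n\bar{x}-1}\,[\,\bar{x}(1-\theta) - \theta + \theta\bar{x}\,]$

    $0 = \bar{x}(1-\hat\theta) - \hat\theta + \hat\theta\bar{x}$

    $\hat\theta = \bar{x}$

3)  Let $\lambda = \phi(\theta) = \begin{cases} 2\theta & \text{for } 0 < \theta \le \tfrac{1}{2} \\ 2 - 2\theta & \text{for } \tfrac{1}{2} < \theta < 1 \end{cases}$

    $\phi : (0,1) \to (0,1]$ but is not 1-1

4)  Define $\phi^{-1}(\lambda) = \theta = \begin{cases} \frac{\lambda}{2} & \text{if } \bar{x} \le \tfrac{1}{2} \\ \frac{2-\lambda}{2} & \text{if } \bar{x} > \tfrac{1}{2} \end{cases}$

    $\phi^{-1} : (0,1] \to (0,1)$

5)  Let $M(\lambda) = L(\phi^{-1}(\lambda)) = \begin{cases} L\!\left(\frac{\lambda}{2}\right) & \text{if } \bar{x} \le \tfrac{1}{2} \\ L\!\left(\frac{2-\lambda}{2}\right) & \text{if } \bar{x} > \tfrac{1}{2} \end{cases}$

    Therefore by theorem 3.2

    $\hat\lambda = \phi(\hat\theta) = \begin{cases} 2\hat\theta = 2\bar{x} & \text{if } \bar{x} \le \tfrac{1}{2} \\ 2 - 2\hat\theta = 2(1-\bar{x}) & \text{if } \bar{x} > \tfrac{1}{2} \end{cases}$

6)  Checking the results directly

    If $\bar{x} \le \tfrac{1}{2}$,  $M(\lambda) = L\!\left(\frac{\lambda}{2}\right) = \left(\frac{\lambda}{2}\right)^{n\bar{x}} \left(1 - \frac{\lambda}{2}\right)^{n(1-\bar{x})}$

    $M'(\lambda) = \frac{n}{2}\left(\frac{\lambda}{2}\right)^{n\bar{x}-1} \left(1 - \frac{\lambda}{2}\right)^{n-n\bar{x}-1} \left[\,\bar{x}\left(1 - \frac{\lambda}{2}\right) - \frac{\lambda}{2}(1-\bar{x})\,\right]$

    $0 = \bar{x} - \frac{\hat\lambda}{2}$

    $\hat\lambda = 2\bar{x}$

    If $\bar{x} > \tfrac{1}{2}$,  $M(\lambda) = L\!\left(\frac{2-\lambda}{2}\right) = \left(\frac{2-\lambda}{2}\right)^{n\bar{x}} \left[1 - \frac{2-\lambda}{2}\right]^{n(1-\bar{x})}$

    $M'(\lambda) = -\frac{n}{2}\left(\frac{2-\lambda}{2}\right)^{n\bar{x}-1} \left[1 - \frac{2-\lambda}{2}\right]^{n-n\bar{x}-1} \left[\,\bar{x}\left(1 - \frac{2-\lambda}{2}\right) - \frac{2-\lambda}{2}(1-\bar{x})\,\right]$

    $0 = \bar{x} - \frac{2-\hat\lambda}{2}$

    $\hat\lambda = 2 - 2\bar{x} = 2(1-\bar{x})$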
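The same kind of numerical check applies to this example. The sketch below is illustrative only, with hypothetical 0/1 data and invented helper names. It maximizes $M(\lambda) = L(\phi^{-1}(\lambda))$ on a grid over $(0, 1]$ and compares the maximizer with $\phi(\bar{x})$ for the two-branch transformation $\phi(\theta) = 2\theta$ or $2 - 2\theta$.

```python
import math

def log_lik(theta, xs):
    """Log of L(theta) = theta**(n*xbar) * (1 - theta)**(n*(1 - xbar))."""
    successes, n = sum(xs), len(xs)
    return successes * math.log(theta) + (n - successes) * math.log(1.0 - theta)

def phi(theta):
    """Two-branch map of (0, 1) onto (0, 1]; not 1-1."""
    return 2.0 * theta if theta <= 0.5 else 2.0 - 2.0 * theta

def phi_inv(lam, xbar):
    """Branch of the inverse selected by the sample mean (step 4)."""
    return lam / 2.0 if xbar <= 0.5 else (2.0 - lam) / 2.0

def induced_mle_grid(xs, steps=10000):
    """Grid-search maximizer of M(lam) = L(phi_inv(lam)) over (0, 1]."""
    xbar = sum(xs) / len(xs)
    grid = (k / steps for k in range(1, steps + 1))
    return max(grid, key=lambda lam: log_lik(phi_inv(lam, xbar), xs))

sample = [1, 1, 0, 1, 1, 0, 1, 1]           # hypothetical data, xbar = 0.75
xbar = sum(sample) / len(sample)
lam_closed_form = phi(xbar)                 # theorem 3.2: 2(1 - xbar) here
lam_numeric = induced_mle_grid(sample)
print(lam_closed_form, lam_numeric)
```

For a sample with $\bar{x} \le \tfrac{1}{2}$ the other branch of `phi_inv` is selected and the maximizer moves to $2\bar{x}$.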
3.3  The Multidimensional Estimation Problem

At this point, it seems logical to consider how the theorem applies in
the estimation problem with a multidimensional parameter space. All examples
considered to this point have been one-dimensional. However, since
restrictions on dimensionality of the parameter space do not occur in the
theorem or its proof, it follows that the theorem applies to the
multidimensional estimation problem. Let $\theta = (\theta_1, \theta_2,
\ldots, \theta_k)$. If $\theta$ is multidimensional, then so is $\hat\theta$,
and the components $(\hat\theta_1, \hat\theta_2, \ldots, \hat\theta_k)$ are
said to be the joint maximum likelihood estimates of the corresponding
$\theta_i$.

Consider the following example of the normal distribution with
$\mu = \theta_1$, $\sigma^2 = \theta_2$ and k = 2.

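1)  $f(x;\theta) = f(x;\theta_1,\theta_2) = \frac{1}{\sqrt{2\pi\theta_2}}\, e^{-\frac{(x-\theta_1)^2}{2\theta_2}}$

2)  $L(\theta) = \left(\frac{1}{2\pi}\right)^{n/2} (\theta_2)^{-n/2} \exp\!\left[-\frac{1}{2\theta_2}\sum (x_i-\theta_1)^2\right]$

    $\frac{\partial L}{\partial \theta_1} = \frac{L(\theta)}{\theta_2} \sum (x_i-\theta_1) = 0$

    $\hat\theta_1 = \bar{x} = \hat\mu$

    $\frac{\partial L}{\partial \theta_2} = L(\theta)\left[-\frac{n}{2\theta_2} + \frac{1}{2\theta_2^2}\sum (x_i-\theta_1)^2\right] = 0$

    $\hat\theta_2 = \frac{1}{n}\sum (x_i-\bar{x})^2 = S^2 = \hat\sigma^2$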

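3)  Let $\lambda = (\lambda_1, \lambda_2) = \phi(\theta_1, \theta_2) = \left(\theta_1^2,\ \frac{1}{\theta_2}\right)$

    $\phi$ is not 1-1

4)  Define $\phi^{-1}(\lambda) = (\theta_1, \theta_2) = \begin{cases} \left(\sqrt{\lambda_1},\ \frac{1}{\lambda_2}\right) & \text{if } \bar{x} \ge 0 \\ \left(-\sqrt{\lambda_1},\ \frac{1}{\lambda_2}\right) & \text{if } \bar{x} < 0 \end{cases}$

5)  Define $M(\lambda) = L(\phi^{-1}(\lambda)) = \begin{cases} L\!\left(\sqrt{\lambda_1},\ \frac{1}{\lambda_2}\right) & \text{if } \bar{x} \ge 0 \\ L\!\left(-\sqrt{\lambda_1},\ \frac{1}{\lambda_2}\right) & \text{if } \bar{x} < 0 \end{cases}$

    Therefore by theorem 3.2

    $\hat\lambda = \phi(\hat\theta) = \phi(\bar{x}, S^2) = \left(\bar{x}^2,\ \frac{1}{S^2}\right)$

6)  Checking the results directly

    if $\bar{x} \ge 0$,  $M(\lambda) = L\!\left(\sqrt{\lambda_1},\ \frac{1}{\lambda_2}\right) = \left(\frac{1}{2\pi}\right)^{n/2} \lambda_2^{\,n/2} \exp\!\left[-\frac{\lambda_2}{2}\sum (x_i-\sqrt{\lambda_1})^2\right]$

    $\frac{\partial M}{\partial \lambda_1} = 0$  gives  $\sum x_i - n\sqrt{\hat\lambda_1} = 0$

    $\hat\lambda_1 = \bar{x}^2 = \hat\theta_1^2$

    $\frac{\partial M}{\partial \lambda_2} = 0$  gives  $\frac{n}{2\hat\lambda_2} - \frac{1}{2}\sum (x_i-\sqrt{\hat\lambda_1})^2 = 0$

    $\frac{1}{\hat\lambda_2} = \frac{1}{n}\sum (x_i-\bar{x})^2 = S^2$

    $\hat\lambda_2 = \frac{1}{S^2}$

    similarly if $\bar{x} < 0$

    $\hat\lambda = (\hat\lambda_1, \hat\lambda_2) = \left(\bar{x}^2,\ \frac{1}{S^2}\right) = \left(\hat\theta_1^2,\ \frac{1}{\hat\theta_2}\right)$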
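A two-dimensional check can be sketched along the same lines. The transformation used below, $\phi(\theta_1, \theta_2) = (\theta_1^2, 1/\theta_2)$, is an assumption about the example: the first component is clear from the text, but the second component is our reading of a damaged formula and should be treated as hypothetical. The data and function names are likewise illustrative. The sketch confirms that the induced likelihood is lower at every perturbed point around $\phi(\bar{x}, S^2)$ than at that point itself.

```python
import math

def log_lik(theta1, theta2, xs):
    """Log-likelihood of N(theta1, theta2), with theta2 the variance."""
    n = len(xs)
    sq = sum((x - theta1) ** 2 for x in xs)
    return -0.5 * n * math.log(2.0 * math.pi * theta2) - sq / (2.0 * theta2)

def phi(theta1, theta2):
    """ASSUMED transformation (theta1**2, 1/theta2); not 1-1 in theta1."""
    return theta1 ** 2, 1.0 / theta2

def phi_inv(lam1, lam2, xbar):
    """Inverse whose sign for the first component follows the sign of xbar."""
    root = math.sqrt(lam1)
    return (root if xbar >= 0 else -root), 1.0 / lam2

def log_induced(lam1, lam2, xs):
    """Log of the induced likelihood M(lam) = L(phi_inv(lam))."""
    xbar = sum(xs) / len(xs)
    theta1, theta2 = phi_inv(lam1, lam2, xbar)
    return log_lik(theta1, theta2, xs)

sample = [-2.1, -3.4, -1.7, -2.9, -2.4]     # hypothetical data, xbar < 0
n = len(sample)
xbar = sum(sample) / n
s2 = sum((x - xbar) ** 2 for x in sample) / n

lam_hat = phi(xbar, s2)                     # theorem 3.2 applied to (xbar, S^2)
center = log_induced(lam_hat[0], lam_hat[1], sample)
shifts = (-0.05, 0.0, 0.05)
neighbours = [
    log_induced(lam_hat[0] * (1 + a), lam_hat[1] * (1 + b), sample)
    for a in shifts for b in shifts if (a, b) != (0.0, 0.0)
]
print(all(value < center for value in neighbours))
```

Comparing against a ring of perturbed points is a weak check compared with a full optimization, but it keeps the example free of external dependencies.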


In some situations we may desire to estimate only a portion of the
$\theta_i$. It should be noted that even though the estimates of only certain
components of $\theta$ are desired, it may be necessary to estimate the
remaining parameters, since the maximizing values for the desired set usually
depend on the remaining parameters. This characteristic is demonstrated in
the example just completed, where the mle of the variance depends on the mle
of the mean.

In estimating only certain of the components of $\theta$ when the remaining
parameters are unknown, theorem 3.2 is applied. However, if some of the
remaining parameters are known, then the problem is quite different and the
dimension of the parameter (estimation) space is reduced by one for each
known parameter value. The problem of estimating the variance of a normal
distribution with parameters $\mu = \theta_1$ and $\sigma^2 = \theta_2$
serves to illustrate this point.

Case I :  $\mu$, $\sigma^2$ unknown

We have seen that $\hat\theta_1 = \bar{x}$ and $\hat\theta_2 = S^2$. In this
case, the parameter space is two-dimensional, a half-plane. That is,
$L : S \to E_1$ where $S$ is $E_1 \times (0,\infty)$.

Case II :  $\mu$ known, $\sigma^2$ unknown

In this case $f(x;\theta) = \frac{1}{\sqrt{2\pi\theta_2}}\, e^{-\frac{(x-\mu)^2}{2\theta_2}}$.
Since $\mu$ is known, $L$ is a function of $\theta_2$ only and it is well
known that $\hat\theta_2 = \frac{1}{n}\sum (x_i-\mu)^2$. In this case the
problem is no longer to estimate a component of a two-dimensional $\theta$;
rather we have a new one-dimensional estimation problem where $S$ is a subset
of $E_1$.
Note that Case I produced $\hat\theta_2 = \frac{1}{n}\sum (x_i-\bar{x})^2$
while Case II produces $\hat\theta_2 = \frac{1}{n}\sum (x_i-\mu)^2$, usually
quite different results. These differences, however, are not due to the
application of the theorem, but result from the fact that the two likelihood
functions are different.
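The size of the difference between the two cases is easy to see on a small sample. The sketch below uses hypothetical data and an assumed known mean of 10.0, and the function names are illustrative. It computes both variance estimates and checks the algebraic identity $\frac{1}{n}\sum (x_i-\mu)^2 = \frac{1}{n}\sum (x_i-\bar{x})^2 + (\bar{x}-\mu)^2$, which shows that the Case II estimate is never smaller than the Case I estimate.

```python
def var_mle_mean_unknown(xs):
    """Case I: variance mle when the mean is estimated by the sample mean."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / n

def var_mle_mean_known(xs, mu):
    """Case II: variance mle when the mean mu is known."""
    return sum((x - mu) ** 2 for x in xs) / len(xs)

sample = [9.2, 10.5, 11.1, 8.7, 10.9]       # hypothetical observations
mu_known = 10.0                             # assumed known mean for Case II

xbar = sum(sample) / len(sample)
case_one = var_mle_mean_unknown(sample)
case_two = var_mle_mean_known(sample, mu_known)

# The two estimates differ by exactly (xbar - mu)^2, up to rounding error.
gap = case_two - case_one
print(case_one, case_two, gap, (xbar - mu_known) ** 2)
```

The two estimates coincide only when the sample mean happens to equal the known mean.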
SECTION  IV


SUMMARY


4.1  Summary of Findings

The objective of this study was to investigate and formalize concepts
and definitions that would allow the invariant property of MLE to be extended
beyond the usually assumed 1-1 estimation situation. The induced likelihood
function was introduced, and it has been shown that by properly defining
the ILF, theorem 3.2 provides the tool for applying the invariance principle
in the estimation problem with a transformation which is not 1-1.
The theorem was shown to be equally applicable in the 1 or k dimension
estimation situation.

In the development of theorem 3.2 it has been strongly emphasized that
the power of the technique lies in the defining of the new likelihood
function, the likelihood function induced on $S^*$. It is felt that, in the
past, not enough emphasis has been focused on this induced likelihood
function.

4.2  Proposed  Areas  for  Further  Study 

This study has not attempted to investigate the distribution theory
related to the mle's $\hat\lambda = \phi(\hat\theta)$ derived using the ILF.
Certainly, it is important to know if present mle distribution theory is
still applicable in the unrestricted estimation situation. Therefore, it is
suggested that an area which presents fertile ground for study is mle
distribution theory in the new situations covered in this study.

The examples presented in this investigation are simple and are intended
merely to acquaint the reader with the proposed use of the theorem
and the ILF. It is hoped that this study has generated reader interest
which will result in application of the induced likelihood function and
the associated theorem in a wide variety of estimation situations.
APPENDIX  ONE


SYMBOLS  AND  ABBREVIATIONS

Symbol                            Definition

mle                               maximum likelihood estimate
1-1                               one-to-one
MLE                               maximum likelihood estimation
$x_i$                             observed value of random variable $X_i$
$\theta_i$                        the $i$th parameter
$f(x;\theta)$                     probability density function
pdf                               probability density function
$E(x)$                            the expectation of x
$(x_1, x_2, \ldots, x_n)$         a sample or observed outcome
$f(\theta \mid x)$                the conditional pdf of $\theta$ given X = x
$L(\theta)$                       the likelihood function
$\hat\theta(x_1, x_2, \ldots, x_n)$   an estimator
$f : S \to E$                     the function f is such that it maps S into E
$E_1$                             the real line, Euclidean 1-space
$\phi$                            the function $\phi$ is such that it maps S onto $S^*$
                                  (onto implies "exhaustive") and is 1-1
$M(\ )$                           the induced likelihood function
ILF                               the induced likelihood function
$[0, 1]$                          the closed interval 0, 1
$[0, 1)$                          the half-closed interval 0, 1
$(0, 1)$                          the open interval 0, 1


